Week 04: Joining tables
Combing data tables, and understanding what can go wrong with AI
Week 04: Joining tables
Combing data tables, and understanding what can go wrong with AI
About the class
Data wrangling is when you prepare the data for the analysis. A key aspect is joining data tables. AI can help design the process, and give you code to do that.
Objectives summary:
Learn how to organize data in a tidy way, join multiple datasets, choose variables to answer a research question and create a reproducible workflow to analyze data.
Learning Objectives
- Learn how to store information in a tidy way.
- Work with relational data.
- Join tables
- Use AI to explain complex concepts
Before class
- Background reading: Békés-Kézdi (2021) Chapter 2
- Download
data-modified.zip
from Here. Unzip. It is a set of csv files such as ‘cities_modified’- Also available on Moodle
- The data description is available here
Class Plan
Recap
Discuss assignment 03 (20 mins by groups + 10 mins together) * Create 4-member groups. Each groups will read reports by an another team (1–>2, 2–>3, N–>1) * Read the other team’s submissions with a ‘reader’s perspective’ and take notes. * which report did you like the most and why * rank reports in terms of how much AI was involved from low to high and note suspicious examples
Task 1: Use AI to understand these terms. Ask examples. (Individual)
- tidy data table
- relational datasets,
- schema,
- primary and foreign key
- composite key
- joining tables
- different types of join
- 1:1, 1:m
- joining tables in your language (python, R, Stata)
This is followed by a discussion.
Task 2: Form 2-3 groups of people using same coding language
Use the data you downloaded to carry out joins and inspect results. Use AI but inspect.
1:1
- Join hotels and cities. Compare left, right, inner, outer joins.
- what happens to N?
1:m
Start: Tabulate the frequency of hotels by city_hotel_counts
Cities to Hotels
- one city joins to multiple hotels
- filter on 2 cities for easier visibility
- Join hotel and occupancy 1
- m:1
- Join hotel and occupancy 2
- get hotel level
- trick: aggregate
- Join on composite key
- create a data table at city-year-match level showing average occupancy and tourist arrivals
Advice, ideas
- discuss and collect ideas from AI
- learn to focus on key suggestions (AI can go nuanced and not important points easily)
Home Assignment
Suggested assignment [/assignments/assignment_04]
End of Week Discussion points
- How useful was AI in teaching skills?
- How useful was AI in actually joining tables?
- How can you debug what AI did in terms of executing code?